Mining Maximal Frequent Subtrees based on Fusion Compression and FP-tree
Abstract
It is commonly accepted that frequent subtree mining plays a pivotal role in areas such as Web log analysis, XML document analysis, semi-structured data analysis, biometric information analysis, and chemical compound structure analysis. This paper proposes an improved algorithm, MFPTM, which is based on fusion compression and the FP-tree principle, for mining maximal frequent subtrees more efficiently. The algorithm first uses fusion compression to retain only the subtrees that consist of frequent nodes, and then mines frequent subtrees according to the FP-tree principle. In this way, MFPTM reduces the search space explored when mining candidate patterns and addresses the main weakness of Apriori-based frequent pattern mining, namely the generation of a large number of candidate patterns. As a result, MFPTM improves the efficiency of mining frequent subtrees.
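The abstract names the two phases of MFPTM but not their details, so the following is only a sketch of the general ideas under stated assumptions, not the paper's algorithm. Phase 1 is approximated by a pruning pass, assuming a label is frequent when it occurs in at least `min_sup` trees and that deleting an infrequent node splits its tree into separate pieces made only of frequent nodes (the paper's actual fusion compression may differ). Phase 2 is illustrated with the classical FP-tree construction for itemsets, applied to the set of labels of each pruned tree, a simplification that ignores tree structure. All names (`Node`, `prune`, `prune_database`, `FPNode`, `build_fp_tree`, `min_sup`) are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Labeled ordered tree node (hypothetical representation)."""
    label: str
    children: list[Node] = field(default_factory=list)


def labels(tree: Node) -> set[str]:
    """Distinct labels occurring anywhere in a tree."""
    out = {tree.label}
    for child in tree.children:
        out |= labels(child)
    return out


def to_str(tree: Node) -> str:
    """Render a tree as label(child,child,...) for inspection."""
    if not tree.children:
        return tree.label
    return f"{tree.label}({','.join(to_str(c) for c in tree.children)})"


def frequent_labels(db: list[Node], min_sup: int) -> set[str]:
    """Assumption: support = number of trees containing the label."""
    counts = Counter(lbl for tree in db for lbl in labels(tree))
    return {lbl for lbl, c in counts.items() if c >= min_sup}


def prune(tree: Node, frequent: set[str]) -> tuple[Node | None, list[Node]]:
    """Remove infrequent nodes. Returns the pruned tree (None if its root is
    infrequent) plus frequent-only subtrees cut loose below removed nodes."""
    kept: list[Node] = []
    detached: list[Node] = []
    for child in tree.children:
        sub, extra = prune(child, frequent)
        detached.extend(extra)
        if sub is not None:
            kept.append(sub)
    if tree.label in frequent:
        return Node(tree.label, kept), detached
    # Infrequent root: its surviving children become separate pieces.
    return None, kept + detached


def prune_database(db: list[Node], min_sup: int) -> list[list[Node]]:
    """Map each input tree to its forest of frequent-only pieces."""
    frequent = frequent_labels(db, min_sup)
    forests = []
    for tree in db:
        root, detached = prune(tree, frequent)
        forests.append(([root] if root is not None else []) + detached)
    return forests


@dataclass
class FPNode:
    """Classical FP-tree node (header table and node-links omitted)."""
    item: str | None
    count: int = 0
    children: dict[str, FPNode] = field(default_factory=dict)


def build_fp_tree(transactions: list[set[str]], min_sup: int) -> FPNode:
    """Insert each transaction's frequent items in descending-support order,
    so transactions with a common prefix share one counted path."""
    support = Counter(item for t in transactions for item in t)
    root = FPNode(None)
    for t in transactions:
        ordered = sorted((i for i in t if support[i] >= min_sup),
                         key=lambda i: (-support[i], i))
        node = root
        for item in ordered:
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root


# Example: labels d and x occur in only one tree each, so min_sup=2 prunes them.
db = [
    Node("a", [Node("b", [Node("c")]), Node("d")]),
    Node("a", [Node("b"), Node("x", [Node("c")])]),
    Node("b", [Node("c")]),
]
forests = prune_database(db, min_sup=2)
# Simplification: one transaction = the label set of a tree's pruned forest.
transactions = [set().union(*(labels(t) for t in f)) for f in forests]
fp_root = build_fp_tree(transactions, min_sup=2)
```

Here the pruned forests are `a(b(c))`, then `a(b)` plus a detached `c` (cut loose when `x` was removed), then `b(c)`. Because items are inserted in descending-support order, all three transactions share the path `b → c` in the FP-tree, which is the source of its compactness; FP-growth-style mining (not shown) then recurses over conditional pattern bases instead of generating Apriori-style candidates.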
Similar resources
CMTreeMiner: Mining Both Closed and Maximal Frequent Subtrees
Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, computer networks, and so on. One important problem in mining databases of trees is to find frequently occurring subtrees. However, because of the combinatorial explosion, the number of frequent subtrees usually grows exponentially with the size of the subtrees. In this paper, we p...
Fast Extraction of Maximal Frequent Subtrees Using Bits Representation
With the continuous growth in XML data sources over the Internet, the discovery of useful information from a collection of XML documents is currently one of the main research areas occupying the data mining community. The most commonly adopted approach to this task is to extract frequently occurring subtree patterns from XML trees. However, the number of frequent subtrees usually grows exponentiall...
Efficient Data Mining for Maximal Frequent Subtrees
A new type of tree mining is defined in this paper, which uncovers maximal frequent induced subtrees from a database of unordered labeled trees. A novel algorithm, PathJoin, is proposed. The algorithm uses a compact data structure, FST-Forest, which compresses the trees and still keeps the original tree structure. PathJoin generates candidate subtrees by joining the frequent paths in FST-Forest...
Smart frequent itemsets mining algorithm based on FP-tree and DIFFset data structures
Association rule data mining is an important technique for finding important relationships in large datasets. Several frequent itemsets mining techniques have been proposed using a prefix-tree structure, FP-tree, a compressed data structure for database representation. The DIFFset data structure has also been shown to significantly reduce the run time and memory utilization of some data mining ...
Discovering Frequent Embedded Subtree Patterns from Large Databases of Unordered Labeled Trees
Recent years have witnessed a surge of research interest in knowledge discovery from data domains with complex structures, such as trees and graphs. In this paper, we address the problem of mining maximal frequent embedded subtrees which is motivated by such important applications as mining “hot” spots of Web sites from Web usage logs and discovering significant “deep” structures from tree-like...
Publication date: 2011